
fix: fix paging issue #370

Merged
hefanli merged 1 commit into main from fix_paging
Feb 24, 2026

Conversation

hefanli (Collaborator) commented Feb 24, 2026

No description provided.

if (file.getFilePath().startsWith(dataset.getPath())) {
try {
Path filePath = Paths.get(file.getFilePath());
Files.deleteIfExists(filePath);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

General approach: Wherever a path is derived from user-controlled fileId and prefix, we must (1) construct it using java.nio.file.Path operations, (2) normalize it, and (3) enforce that the resulting absolute path stays within the intended dataset root directory. We should also reject fileId/prefix that contain path traversal patterns (..) or path separators if they are intended to be single path components. This avoids directory traversal and arbitrary file deletion/reading.

Best concrete fix here:

  1. In DatasetFileApplicationService.getDatasetFile(Dataset dataset, String fileId, String prefix):

    • Treat fileId and prefix as relative components under dataset.getPath().
    • Build the path using Paths.get(dataset.getPath()).resolve(prefix).resolve(fileId).normalize().toAbsolutePath().
    • Compute Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath(); and check that filePath.startsWith(datasetRoot). If not, throw IllegalArgumentException.
    • Optionally ensure fileId does not contain directory traversal markers by checking for .., /, \ based on expected semantics. Given existing behavior allows directory-like use via prefix, we will at least enforce the normalized containment check, which is stronger and compatible with current semantics.
    • Set file.setFilePath(filePath.toString()) rather than using string concatenation.
  2. In DatasetFileApplicationService.deleteDatasetFile:

    • Replace the plain if (file.getFilePath().startsWith(dataset.getPath())) and Paths.get(file.getFilePath()) with:
      • Construct normalized absolute Path datasetRoot and Path filePath.
      • Verify filePath.startsWith(datasetRoot) before calling Files.deleteIfExists.
    • This guards even database-stored filePath values.
  3. In DatasetFileApplicationService.downloadFile(DatasetFile file):

    • Ideally, retrieve the owning dataset and apply the same containment check. That is not possible here: the method receives only a DatasetFile, and we cannot change its signature or other files to pass the dataset in, so root containment cannot be fully enforced. At minimum, the path can be normalized and required to be absolute. The most critical uncontrolled filesystem path in this alert is deletion, and CodeQL points specifically at deleteDatasetFile. For downloadFile, the path comes from DatasetFile.filePath, which getDatasetFile now normalizes and validates when computing it, so validating at creation time already reduces the risk. downloadFile is therefore left mostly unchanged.

Implementation details:

  • All changes are in backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java.
  • We already import java.nio.file.Path and java.nio.file.Paths, so no new imports are needed.
  • We add logic inside getDatasetFile to build and validate a Path.
  • We modify deleteDatasetFile to use normalized Path objects and a robust startsWith check instead of string-based startsWith.
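
The difference between the string-based and the Path-based startsWith check can be illustrated with a small standalone sketch. The class name, method names, and the /data/ds1 root below are hypothetical and are not part of DatasetFileApplicationService; POSIX-style paths are assumed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class StartsWithComparison {

    /** String-prefix check, as in the original deleteDatasetFile condition. */
    static boolean insideByString(String root, String candidate) {
        return candidate.startsWith(root);
    }

    /** Component-wise check on normalized absolute paths, as in the patch. */
    static boolean insideByPath(String root, String candidate) {
        Path rootPath = Paths.get(root).toAbsolutePath().normalize();
        Path candidatePath = Paths.get(candidate).toAbsolutePath().normalize();
        return candidatePath.startsWith(rootPath);
    }

    public static void main(String[] args) {
        String root = "/data/ds1"; // hypothetical dataset root
        String[] candidates = {
            "/data/ds1/a.txt",        // genuinely inside the root
            "/data/ds1-backup/a.txt", // sibling directory sharing the string prefix
            "/data/ds1/../ds2/a.txt"  // traversal out of the root
        };
        for (String candidate : candidates) {
            System.out.printf("%s string=%b path=%b%n", candidate,
                    insideByString(root, candidate), insideByPath(root, candidate));
        }
    }
}
```

In this sketch the string check accepts all three candidates, while the Path check accepts only the first: Path.startsWith compares whole name elements, and normalize() removes the .. segment before the comparison.
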
Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -205,12 +205,19 @@
     public DatasetFile getDatasetFile(Dataset dataset, String fileId, String prefix) {
         prefix = StringUtils.isBlank(prefix) ? "" : prefix;
         if (dataset != null && !CommonUtils.isUUID(fileId) && !fileId.startsWith(".") && !prefix.startsWith(".")) {
+            // Build and validate the file path to prevent path traversal
+            Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+            Path filePath = datasetRoot.resolve(prefix).resolve(fileId).normalize().toAbsolutePath();
+            if (!filePath.startsWith(datasetRoot)) {
+                throw new IllegalArgumentException("Invalid file path");
+            }
+
             DatasetFile file = new DatasetFile();
             file.setId(fileId);
             file.setFileName(fileId);
             file.setDatasetId(dataset.getId());
             file.setFileSize(0L);
-            file.setFilePath(dataset.getPath() + File.separator + prefix + fileId);
+            file.setFilePath(filePath.toString());
             return file;
         }
         DatasetFile file = datasetFileRepository.getById(fileId);
@@ -237,13 +239,14 @@
         }
         datasetRepository.updateById(dataset);
         // 删除文件时,上传到数据集中的文件会同时删除数据库中的记录和文件系统中的文件,归集过来的文件仅删除数据库中的记录
-        if (file.getFilePath().startsWith(dataset.getPath())) {
-            try {
-                Path filePath = Paths.get(file.getFilePath());
+        try {
+            Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+            Path filePath = Paths.get(file.getFilePath()).normalize().toAbsolutePath();
+            if (filePath.startsWith(datasetRoot)) {
                 Files.deleteIfExists(filePath);
-            } catch (IOException ex) {
-                throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
             }
+        } catch (IOException ex) {
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
         }
     }
 
EOF

Copilot is powered by AI and may make mistakes. Always verify output.
try {
Path filePath = Paths.get(file.getFilePath()).normalize();
log.info("start download file {}", file.getFilePath());
Resource resource = new UrlResource(filePath.toUri());

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

In general, the problem should be fixed by validating and constraining any user-controlled components before they are used to build a filesystem path, and by ensuring that the final resolved path remains under an expected base directory. Here, that base directory is dataset.getPath(). We need to (1) restrict fileId and prefix when they are used as path components, and (2) enforce that the normalized Path used for deletion and download never escapes the dataset root.

The best minimally invasive fix is:

  1. In getDatasetFile:

    • Treat the non-UUID case as a “virtual” file that must be relative to the dataset root.
    • Build the path using Paths.get(dataset.getPath()).resolve(...) and normalize().
    • Explicitly reject fileId and prefix that contain ".." or path separators, or that start with a separator, because they are intended here as simple names or single-segment prefixes.
    • After resolving, verify that the resolved path still starts with the normalized dataset root path; if not, throw IllegalArgumentException.
  2. In downloadFile:

    • Normalize the path. Ideally we would also resolve it under a trusted base such as Paths.get(dataset.getPath()), but downloadFile receives only a DatasetFile, so we instead need to ensure that the file.getFilePath() value loaded from the DB is safe: at minimum, normalize it and fail if it is not absolute or still contains .. segments.
    • The synthetic non-UUID branch already guarantees containment, and DB paths should already be under the dataset root; the additional normalization here reduces the risk of traversal via unusual sequences.
  3. In deleteDatasetFile:

    • Replace the raw Paths.get(file.getFilePath()) with normalization and containment validation (similar to getDatasetFile), rejecting any path that escapes dataset.getPath() before deleting.

We are only allowed to change the provided snippets, so we’ll implement validation and containment checks directly in DatasetFileApplicationService.getDatasetFile, deleteDatasetFile, and downloadFile. No new external dependencies are required; we can use existing java.nio.file.Path, Paths, and string checks.
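
The component-level rejection described for getDatasetFile can be sketched as a small predicate. The class and method names are hypothetical; the rules mirror the conditions in the patch below, except that this helper also rejects null and empty values, whereas the patch normalizes a blank prefix to "" and accepts it.

```java
final class PathComponentCheck {

    /**
     * Returns true when value is acceptable as a single path component
     * under the rules used in this suggestion.
     */
    static boolean isSingleSafeComponent(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        // Leading dot: hidden files, "." and ".." (mirrors the startsWith(".") checks).
        if (value.startsWith(".")) {
            return false;
        }
        // Any ".." substring; note this also rejects harmless names such as "a..b".
        if (value.contains("..")) {
            return false;
        }
        // Either separator style, so the value cannot introduce extra path segments.
        return !value.contains("/") && !value.contains("\\");
    }
}
```

For example, isSingleSafeComponent("report.csv") returns true, while "../etc", "sub/file.txt", and "a..b" are all rejected. Rejecting separators also means a multi-segment prefix such as "images/2024" would no longer be accepted, so this is stricter than a containment check alone.
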


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -204,13 +204,36 @@
     @Transactional(readOnly = true)
     public DatasetFile getDatasetFile(Dataset dataset, String fileId, String prefix) {
         prefix = StringUtils.isBlank(prefix) ? "" : prefix;
-        if (dataset != null && !CommonUtils.isUUID(fileId) && !fileId.startsWith(".") && !prefix.startsWith(".")) {
+        if (dataset != null && !CommonUtils.isUUID(fileId)) {
+            // When fileId is not a UUID, treat it as a relative path component under the dataset path.
+            // Validate that fileId and prefix do not contain path traversal or separators.
+            if (fileId.startsWith(".")
+                    || prefix.startsWith(".")
+                    || fileId.contains("..")
+                    || prefix.contains("..")
+                    || fileId.contains("/") || fileId.contains("\\")
+                    || prefix.contains("/") || prefix.contains("\\")) {
+                throw new IllegalArgumentException("Invalid file identifier or prefix");
+            }
+
+            Path datasetRoot = Paths.get(dataset.getPath()).normalize();
+            // Build relative path using Path APIs to avoid simple string concatenation issues
+            Path relativePath = StringUtils.isBlank(prefix)
+                    ? Paths.get(fileId)
+                    : Paths.get(prefix, fileId);
+            Path resolvedPath = datasetRoot.resolve(relativePath).normalize();
+
+            // Ensure the resolved path stays within the dataset root directory
+            if (!resolvedPath.startsWith(datasetRoot)) {
+                throw new IllegalArgumentException("File path escapes dataset root");
+            }
+
             DatasetFile file = new DatasetFile();
             file.setId(fileId);
             file.setFileName(fileId);
             file.setDatasetId(dataset.getId());
             file.setFileSize(0L);
-            file.setFilePath(dataset.getPath() + File.separator + prefix + fileId);
+            file.setFilePath(resolvedPath.toString());
             return file;
         }
         DatasetFile file = datasetFileRepository.getById(fileId);
@@ -237,10 +255,14 @@
         }
         datasetRepository.updateById(dataset);
         // 删除文件时,上传到数据集中的文件会同时删除数据库中的记录和文件系统中的文件,归集过来的文件仅删除数据库中的记录
-        if (file.getFilePath().startsWith(dataset.getPath())) {
+        if (file.getFilePath() != null) {
             try {
-                Path filePath = Paths.get(file.getFilePath());
-                Files.deleteIfExists(filePath);
+                Path datasetRoot = Paths.get(dataset.getPath()).normalize();
+                Path filePath = Paths.get(file.getFilePath()).normalize();
+                // Only delete files that are inside the dataset root directory
+                if (filePath.startsWith(datasetRoot)) {
+                    Files.deleteIfExists(filePath);
+                }
             } catch (IOException ex) {
                 throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
             }
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Path target = Paths.get(targetPath).normalize();

// 检查源文件是否存在且为普通文件
if (!Files.exists(source) || !Files.isRegularFile(source)) {

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

In general, to fix uncontrolled path usage, you must (1) define one or more trusted base directories, (2) ensure that any path provided by the client is treated strictly as a relative path segment or is validated/normalized, and (3) reject or constrain any path that escapes the allowed base directories (for example by containing .. or being absolute). For simple file names, you can forbid separators entirely; for hierarchical paths, you can use Path.normalize() and check that the resulting path is still beneath a trusted root.

In this code, the risk comes from file.getFilePath() being used directly as sourPath for filesystem operations. We should ensure that the resolved source file that will be linked/copied is inside a safe area, which in this system is naturally the dataset directory. We already know the dataset’s root path (dataset.getPath()), and we build the dataset’s target file path from dataset.getPath() and req.getPrefix(). The safest, minimal-change fix is:

  • In getDatasetFileForAdd, compute the dataset directory root (Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();) and the requested prefix path under it (datasetRoot.resolve(req.getPrefix()).normalize()), then verify that this directory stays within datasetRoot. This guards against prefix escaping the dataset folder.
  • When validating the source path in addFile, ensure that the normalized source remains inside datasetRoot (passed from the caller), and reject it if not. Alternatively, if the intended design is that filePath is an absolute server-side path into a known staging area, we would instead validate it against that staging base; but we are not shown such a base, and we are already conceptually operating in terms of the dataset path, so constraining both prefix and filePath to the dataset area is the least disruptive, safest assumption.

Concretely, we will:

  1. Change the signature of addFile to also accept the dataset root path (string) and perform a base-directory check for source.
  2. Update the call to addFile in addFilesToDataset (line 851) to pass dataset.getPath() as the base.
  3. In addFile, after Path source = Paths.get(sourPath).normalize();, compute Path base = Paths.get(basePath).normalize().toAbsolutePath(); and resolve source against base if sourPath is relative, then ensure that the final source path starts with base. If the check fails, log and throw a business exception.
  4. In getDatasetFileForAdd, ensure that req.getPrefix() is applied as a path segment under dataset.getPath(), and that the resulting directory path still lies within the dataset root; if not, throw a BusinessException with an appropriate error code (for example DataManagementErrorCode.DIRECTORY_NOT_FOUND).

This keeps the outward behavior (adding files into datasets) the same for valid inputs, but blocks malicious or malformed paths that try to escape the dataset area.
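
Step 3 can be sketched in isolation as follows. This is a minimal illustration, not the project's code: the class name, method name, and the use of IllegalArgumentException in place of BusinessException are assumptions, and POSIX-style paths are assumed. Because Path.resolve returns its argument unchanged when that argument is absolute, a single resolve call covers both the relative and the absolute case.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SourceContainment {

    /**
     * Resolves sourcePath against basePath and returns the normalized result,
     * or throws if the result lies outside basePath.
     */
    static Path resolveUnderBase(String basePath, String sourcePath) {
        Path base = Paths.get(basePath).toAbsolutePath().normalize();
        // resolve() appends a relative source to base and returns an absolute source as-is.
        Path resolved = base.resolve(Paths.get(sourcePath)).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException(
                    "Source path " + resolved + " is outside of base directory " + base);
        }
        return resolved;
    }
}
```

With a hypothetical base of /data/ds1, the source "sub/a.txt" resolves to /data/ds1/sub/a.txt, while "/etc/passwd" and "../ds2/a.txt" are rejected. Whether sources should be confined to the dataset directory at all depends on the intended design, as the note about a staging area above points out.
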

Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -848,7 +848,8 @@
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+                // Ensure that the source file path stays within the dataset directory
+                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());
             }
         } catch (BusinessException e) {
             throw e;
@@ -863,13 +864,26 @@
         return addedFiles;
     }
 
-    private void addFile(String sourPath, String targetPath, boolean softAdd) {
-        if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
+    private void addFile(String sourPath, String targetPath, boolean softAdd, String basePath) {
+        if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(basePath)) {
             return;
         }
+
+        Path base = Paths.get(basePath).normalize().toAbsolutePath();
         Path source = Paths.get(sourPath).normalize();
+        if (!source.isAbsolute()) {
+            source = base.resolve(source).normalize();
+        } else {
+            source = source.toAbsolutePath().normalize();
+        }
         Path target = Paths.get(targetPath).normalize();
 
+        // Ensure the source file path is inside the dataset directory to prevent path traversal
+        if (!source.startsWith(base)) {
+            log.warn("Source file path {} is outside of allowed base directory {}", source, base);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -910,13 +919,20 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+        Path targetDir = datasetRoot.resolve(req.getPrefix()).normalize();
+        // Prevent prefix from placing the file outside the dataset directory
+        if (!targetDir.startsWith(datasetRoot)) {
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(targetDir.resolve(fileName).toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Path parent = target.getParent();
// 创建目标目录(如果需要)
if (parent != null) {
Files.createDirectories(parent);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

General approach: ensure any path derived from user input is constrained to a safe base directory. For this code, that means:

  • Construct the target path for dataset files strictly under dataset.getPath() using only sanitized components (dataset path + safe prefix + file name).
  • Validate that the resolved target path remains under the dataset base path after normalize().
  • Prevent arbitrary source paths if they are user-controlled in a risky way (here, they appear to be absolute paths to existing files, which may be acceptable in a trusted environment; we will at least constrain the target).

Best concrete fix with minimal behavior change:

  1. Introduce a small helper resolveSafeDatasetPath(Dataset dataset, String prefix, String fileName) in DatasetFileApplicationService that:

    • Builds base as Paths.get(dataset.getPath()).toAbsolutePath().normalize().
    • Builds relative from prefix and fileName (treating empty prefix correctly).
    • Combines them into resolved = base.resolve(relative).normalize().
    • Verifies resolved.startsWith(base); if not, logs and throws BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND) (or another appropriate error).
    • Returns resolved.
  2. Use this helper in getDatasetFileForAdd instead of Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString(). This ensures the path stored in DatasetFile.filePath is guaranteed to reside under the dataset directory.

  3. Additionally, harden addFile so that when it is used for dataset operations, the target path has already been checked. Since addFile receives only strings and we should not assume more context, we keep its current behavior but rely on the fact that, for this flow, targetPath is now sanitized via step 1. We keep the normalization and directory creation as-is.

All changes are localized to DatasetFileApplicationService.java; no modifications are needed in DatasetFileController.java or DatasetFile.java, and no new external dependencies are required.
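
To show how such a helper treats different prefix values, here is a standalone sketch of the same logic. It takes the dataset root as a plain string instead of a Dataset, replaces StringUtils.isBlank with a null/trim check, and throws IllegalArgumentException instead of BusinessException; these substitutions, the class name, and the example paths are assumptions made so the block compiles on its own. POSIX-style paths are assumed.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SafeDatasetPath {

    /** Joins prefix and fileName under datasetRoot and rejects results that escape it. */
    static Path resolve(String datasetRoot, String prefix, String fileName) {
        Path base = Paths.get(datasetRoot).toAbsolutePath().normalize();
        boolean blankPrefix = prefix == null || prefix.trim().isEmpty();
        Path relative = blankPrefix ? Paths.get(fileName) : Paths.get(prefix, fileName);
        Path resolved = base.resolve(relative).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException(
                    "Resolved path " + resolved + " escapes base path " + base);
        }
        return resolved;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the first two are accepted, the last two are rejected.
        String[][] inputs = {
            {"", "a.txt"}, {"images", "a.txt"}, {"../ds2", "a.txt"}, {"/etc", "a.txt"}
        };
        for (String[] in : inputs) {
            try {
                System.out.println(resolve("/data/ds1", in[0], in[1]));
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + e.getMessage());
            }
        }
    }
}
```

An absolute prefix such as "/etc" is caught by the same containment check as "../ds2", because resolving an absolute path against the base yields that absolute path unchanged.
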


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -824,6 +824,30 @@
     }
 
     /**
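+     * Resolves a safe file path from the dataset root path, a safe prefix, and a file name.
+     * Ensures the final path stays under the dataset directory to prevent directory traversal.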
+     */
+    private Path resolveSafeDatasetPath(Dataset dataset, String prefix, String fileName) {
+        Path basePath = Paths.get(dataset.getPath()).toAbsolutePath().normalize();
+
+        Path relativePath;
+        if (StringUtils.isBlank(prefix)) {
+            relativePath = Paths.get(fileName);
+        } else {
+            relativePath = Paths.get(prefix, fileName);
+        }
+
+        Path resolved = basePath.resolve(relativePath).normalize();
+
+        if (!resolved.startsWith(basePath)) {
+            log.warn("Resolved dataset file path {} escapes base path {}", resolved, basePath);
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+
+        return resolved;
+    }
+
+    /**
      * 添加文件到数据集(仅创建数据库记录,不执行文件系统操作)
      *
      * @param datasetId 数据集id
@@ -902,21 +926,23 @@
         }
     }
 
-    private static DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file,
-                                                    Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException {
+    private DatasetFile getDatasetFileForAdd(AddFilesRequest req, AddFilesRequest.FileRequest file,
+                                             Dataset dataset, ObjectMapper objectMapper) throws JsonProcessingException {
         Path sourcePath = Paths.get(file.getFilePath());
         File sourceFile = sourcePath.toFile();
         file.getMetadata().put("softAdd", req.isSoftAdd());
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        Path safeTargetPath = resolveSafeDatasetPath(dataset, req.getPrefix(), fileName);
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(safeTargetPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
if (parent != null) {
Files.createDirectories(parent);
}
Files.deleteIfExists(target);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

In general, to fix uncontrolled path usage, we must (1) constrain all user-controlled path components to a designated base directory and (2) validate or normalize those components to prevent path traversal (such as .. segments or absolute paths). The checks should be done before any filesystem operations, and they should enforce that the final resolved path is still under the intended root.

In this code, the unsafe pieces are:

  • file.getFilePath() (used as the source path).
  • req.getPrefix() (used to construct the target filePath and thus targetPath passed to addFile).

The safest minimal change, without altering existing business behavior, is:

  1. Restrict the target path (the dataset file location) to be under dataset.getPath() by:
    • Building the path with Paths.get(dataset.getPath()).resolve(req.getPrefix()).resolve(fileName).normalize().
    • Ensuring that the normalized result starts with the normalized dataset root.
    • If it does not, reject the request with an appropriate BusinessException.
  2. In addFile, verify that target is within the dataset root directory (and optionally that it is not a symlink) before creating directories, deleting, linking, or copying. Since addFile currently receives just strings, we can pass the dataset path (root) into it as an additional argument or reconstruct the dataset root from targetPath. To stay within minimal functional changes, we will pass a datasetRoot string from the caller (where we have dataset.getPath()).
  3. For the source path (sourPath), we should at least reject obviously dangerous values: blank paths, relative paths containing .. segments, and paths containing illegal characters, because the code expects to link or copy from local files. If in practice only server-generated paths are used, this mainly hardens against misuse. We will add a simple validation that the source path is absolute and normalized, and that it is not under system-critical directories (kept lightweight to avoid breaking expected behavior).

Concretely:

  • Modify getDatasetFileForAdd in DatasetFileApplicationService to:
    • Compute Path datasetRoot = Paths.get(dataset.getPath()).normalize();
    • Compute Path prefixPath = StringUtils.isBlank(req.getPrefix()) ? datasetRoot : datasetRoot.resolve(req.getPrefix()).normalize();
    • Ensure prefixPath.startsWith(datasetRoot); otherwise throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND) (or a more appropriate code if available).
    • Compute Path targetPath = prefixPath.resolve(fileName).normalize();
    • Ensure targetPath.startsWith(datasetRoot); otherwise throw the same exception.
    • Use targetPath.toString() for filePath in the builder.
  • Modify addFilesToDataset to pass the dataset root path into addFile: addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());
  • Update the addFile method signature to include the dataset root and:
    • Normalize datasetRoot to Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath();
    • Normalize source and target to absolute paths.
    • Ensure target.startsWith(datasetRootPath); if not, log and throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR) before any filesystem write.
    • Optionally perform a simple validation on sourPath: reject if it contains .. segments when normalized, or if it is not absolute.

These changes ensure that, even if a malicious caller provides a crafted prefix or file path, the service will not write outside the dataset’s directory tree.
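The containment check described above can be sketched in isolation. This is a minimal illustration, not the service's code: the class DatasetPathGuard and the method resolveWithin are hypothetical names, and a plain IllegalArgumentException stands in for BusinessException.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class DatasetPathGuard {

    /**
     * Hypothetical helper: resolves prefix and fileName under root and
     * throws if the normalized result leaves the root directory.
     */
    static Path resolveWithin(String root, String prefix, String fileName) {
        Path rootPath = Paths.get(root).toAbsolutePath().normalize();
        Path resolved = rootPath;
        // A null or blank prefix means "directly under the root".
        if (prefix != null && !prefix.trim().isEmpty()) {
            resolved = resolved.resolve(prefix);
        }
        resolved = resolved.resolve(fileName).normalize();
        // Path.startsWith compares whole name elements, not characters.
        if (!resolved.startsWith(rootPath)) {
            throw new IllegalArgumentException("Path escapes dataset root: " + resolved);
        }
        return resolved;
    }

    public static void main(String[] args) {
        System.out.println(resolveWithin("/data/ds1", "images", "a.png"));
        // A String prefix test accepts a sibling directory such as /data/ds10.
        System.out.println("/data/ds10/a.png".startsWith("/data/ds1"));
        // The Path-based test rejects it because "ds10" is not the element "ds1".
        System.out.println(Paths.get("/data/ds10/a.png").startsWith(Paths.get("/data/ds1")));
    }
}
```

The check relies on Path.startsWith rather than String.startsWith, because a string comparison would treat a sibling directory whose name merely begins with the root's name as being inside the root.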


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -848,7 +848,7 @@
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());
             }
         } catch (BusinessException e) {
             throw e;
@@ -863,13 +863,20 @@
         return addedFiles;
     }
 
-    private void addFile(String sourPath, String targetPath, boolean softAdd) {
-        if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
+    private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetRoot) {
+        if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath) || StringUtils.isBlank(datasetRoot)) {
             return;
         }
-        Path source = Paths.get(sourPath).normalize();
-        Path target = Paths.get(targetPath).normalize();
+        Path datasetRootPath = Paths.get(datasetRoot).normalize().toAbsolutePath();
+        Path source = Paths.get(sourPath).normalize().toAbsolutePath();
+        Path target = Paths.get(targetPath).normalize().toAbsolutePath();
 
+            // Ensure the target path is under the dataset root to prevent directory traversal or arbitrary path writes
+        if (!target.startsWith(datasetRootPath)) {
+            log.warn("Target path is outside of dataset root. datasetRoot={}, target={}", datasetRootPath, target);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
@@ -910,13 +914,31 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // Build and validate the target path so it always stays under the dataset root
+        Path datasetRootPath = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+        Path prefixPath;
+        if (StringUtils.isBlank(req.getPrefix())) {
+            prefixPath = datasetRootPath;
+        } else {
+            prefixPath = datasetRootPath.resolve(req.getPrefix()).normalize();
+        }
+        if (!prefixPath.startsWith(datasetRootPath)) {
+            log.warn("Invalid prefix for dataset. datasetRoot={}, prefix={}", datasetRootPath, req.getPrefix());
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+        Path targetPath = prefixPath.resolve(fileName).normalize();
+        if (!targetPath.startsWith(datasetRootPath)) {
+            log.warn("Computed target path is outside of dataset root. datasetRoot={}, target={}", datasetRootPath, targetPath);
+            throw BusinessException.of(DataManagementErrorCode.DIRECTORY_NOT_FOUND);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(targetPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
if (softAdd) {
// Try to create a hard link first, then fall back to a symbolic link; throw if both fail
try {
Files.createLink(target, source);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

General approach: ensure that any filesystem path derived from user-controlled data is constrained to a safe base directory. Here, the safe base is dataset.getPath(). We should (1) construct the target file path based on dataset.getPath() and user input, (2) normalize and convert both base and target to absolute paths, and (3) verify that the target is still under the base (startsWith). If not, reject the request.

Best concrete fix while preserving behavior:

  1. In getDatasetFileForAdd, when building the DatasetFile.filePath from dataset.getPath(), req.getPrefix(), and fileName, we already use Paths.get(...). We should:

    • Normalize and make the resulting path absolute.
    • Ensure it remains under the dataset root directory (dataset.getPath()).
    • If it escapes, throw a BusinessException using an appropriate error code (e.g., SystemErrorCode.FILE_SYSTEM_ERROR or an existing data-management error code).
    • Store the validated, normalized path string in filePath.
  2. In addFile, before using target in filesystem operations, we should:

    • Ideally, require the caller to pass the dataset base path (or compute it here if available). However, we cannot change the method signature without touching callers across the codebase. In this code path the only usage we see is addFile(file.getFilePath(), datasetFile.getFilePath(), ...), where datasetFile.getFilePath() will already have been validated by the improved getDatasetFileForAdd, so tightening getDatasetFileForAdd is sufficient to ensure target is safe. We can still harden addFile by re-normalizing target and checking that it is absolute (to avoid relative-path surprises); this does not require knowing the dataset root, since containment is already enforced earlier.

Given the constraints to only edit the shown snippets and not alter external contracts, the minimally invasive, effective fix is to:

  • Update getDatasetFileForAdd in DatasetFileApplicationService to:
    • Build a normalized Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
    • Build Path targetPath = datasetRoot.resolve(req.getPrefix()).resolve(fileName).normalize().toAbsolutePath();
    • Verify that targetPath.startsWith(datasetRoot); otherwise throw a BusinessException (e.g., SystemErrorCode.FILE_SYSTEM_ERROR).
    • Set filePath to targetPath.toString() instead of the previous Paths.get(...).toString().

addFile can remain logically the same, because it will now receive a targetPath that has already been validated to stay inside datasetRoot. This directly addresses the CodeQL alert on line 889: although target is still tainted, it is constrained to a safe directory, which removes the security risk and should satisfy the analyzer once it recognizes the containment check.

No changes are needed in DatasetFileController or DatasetFile model for this specific path traversal issue.
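The chained resolve calls in this variant depend on how Path.resolve treats an empty string. The sketch below shows that behavior under stated assumptions: PrefixDefaulting and target are hypothetical names, and the local defaultString method stands in for StringUtils.defaultString, which maps null to an empty string.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class PrefixDefaulting {

    /** Stand-in for StringUtils.defaultString: null becomes the empty string. */
    static String defaultString(String s) {
        return s == null ? "" : s;
    }

    /** Hypothetical helper mirroring the chained resolve calls of the patch. */
    static Path target(String root, String prefix, String fileName) {
        Path rootPath = Paths.get(root).normalize().toAbsolutePath();
        // Resolving an empty path returns the receiver unchanged,
        // so a missing prefix places the file directly under the root.
        return rootPath.resolve(defaultString(prefix)).resolve(fileName).normalize();
    }

    public static void main(String[] args) {
        System.out.println(target("/data/ds1", null, "a.csv"));
        System.out.println(target("/data/ds1", "sub", "a.csv"));
    }
}
```

One difference from a StringUtils.isBlank check is worth keeping in mind: defaultString only handles null, so a whitespace-only prefix would still be resolved as a path element instead of being treated as absent.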


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -910,13 +910,28 @@
         LocalDateTime currentTime = LocalDateTime.now();
         String fileName = sourcePath.getFileName().toString();
 
+        // Build and validate the target path to ensure it stays under the dataset root
+        Path datasetRoot = Paths.get(dataset.getPath()).normalize().toAbsolutePath();
+        Path targetPath = datasetRoot
+                .resolve(StringUtils.defaultString(req.getPrefix()))
+                .resolve(fileName)
+                .normalize()
+                .toAbsolutePath();
+
+        // Ensure the normalized target path is still within the dataset root directory
+        if (!targetPath.startsWith(datasetRoot)) {
+            log.warn("Invalid target path when adding file to dataset. root={}, target={}, prefix={}, fileName={}",
+                    datasetRoot, targetPath, req.getPrefix(), fileName);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         return DatasetFile.builder()
                 .id(UUID.randomUUID().toString())
                 .datasetId(dataset.getId())
                 .fileName(fileName)
                 .fileType(AnalyzerUtils.getExtension(fileName))
                 .fileSize(sourceFile.length())
-                .filePath(Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString())
+                .filePath(targetPath.toString())
                 .uploadTime(currentTime)
                 .lastAccessTime(currentTime)
                 .metadata(objectMapper.writeValueAsString(file.getMetadata()))
EOF
} catch (Throwable hardEx) {
log.warn("create hard link failed from {} to {}: {}", source, target, hardEx.getMessage());
}
Files.createSymbolicLink(target, source);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

In general terms, the fix is to ensure that any filesystem path derived from user input is validated before being used as a target for file operations. The usual pattern is: build the path relative to a known safe base directory, normalize it, then check that the resulting absolute path is still within the base directory. If the check fails, reject the request.

In this case, the key unsafe operation is in addFile(String sourPath, String targetPath, boolean softAdd) in DatasetFileApplicationService. Here targetPath ultimately comes from Paths.get(dataset.getPath(), req.getPrefix(), fileName).toString() which includes the user-controlled req.getPrefix(). The best fix with minimal functional impact is:

  • Add a reference to the dataset base directory into addFile, by passing dataset.getPath() as an extra argument when calling it from addFilesToDataset.
  • Inside addFile, resolve and normalize the target path against the trusted base directory, then ensure the normalized path starts with that base directory. If not, throw a BusinessException (e.g., using an existing error code such as SystemErrorCode.FILE_SYSTEM_ERROR or a more specific one if desired).
  • Keep source path behavior as-is, because the described vulnerability is specifically about the target path that the service writes to within the dataset.

Concretely:

  1. Change the signature of addFile from private void addFile(String sourPath, String targetPath, boolean softAdd) to private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetBasePath).
  2. Adjust the call in addFilesToDataset to pass dataset.getPath() as the new argument.
  3. Inside addFile:
    • Compute Path baseDir = Paths.get(datasetBasePath).toAbsolutePath().normalize();
    • Compute Path target = baseDir.resolve(targetPath).normalize();
    • Check if (!target.startsWith(baseDir)) { log.warn(...); throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR); }
    • Proceed with directory creation and link/copy using this validated target.
  4. Keep other behavior (softAdd, hard link fallback, etc.) unchanged.

This stays within the existing files (DatasetFileApplicationService) and uses only standard JDK APIs (java.nio.file.Paths, Path.startsWith), so no new external dependencies are required.
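Because targetPath in this flow is already built from dataset.getPath(), the result of baseDir.resolve(targetPath) depends on whether that string is absolute. The sketch below illustrates the resolve semantics only; ResolveSemantics and resolveTarget are hypothetical names, and the example paths are assumptions chosen for a POSIX-style file system.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ResolveSemantics {

    /** Hypothetical helper: the resolve-and-normalize step from the steps above. */
    static Path resolveTarget(Path baseDir, String targetPath) {
        // If targetPath is absolute, resolve returns it unchanged;
        // if it is relative, it is appended to baseDir.
        return baseDir.resolve(targetPath).normalize();
    }

    public static void main(String[] args) {
        Path base = Paths.get("/data/ds1");
        // Absolute target inside the base: returned as-is and passes the check.
        System.out.println(resolveTarget(base, "/data/ds1/x/a.png").startsWith(base));
        // Absolute target elsewhere: returned as-is and fails the check.
        System.out.println(resolveTarget(base, "/etc/passwd").startsWith(base));
        // Relative base and relative target that repeats the base: the base is doubled.
        Path relBase = Paths.get("data/ds1").toAbsolutePath().normalize();
        System.out.println(resolveTarget(relBase, "data/ds1/a.png"));
    }
}
```

If dataset.getPath() were ever a relative path, the last case suggests the target would pass the containment check but land in a nested, duplicated directory, so it may be safer to confirm that dataset paths are stored as absolute paths before relying on this form.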

Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -848,7 +848,8 @@
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+                // Use the dataset root path as the trusted base directory so the target path cannot escape it
+                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd(), dataset.getPath());
             }
         } catch (BusinessException e) {
             throw e;
@@ -863,13 +864,20 @@
         return addedFiles;
     }
 
-    private void addFile(String sourPath, String targetPath, boolean softAdd) {
+    private void addFile(String sourPath, String targetPath, boolean softAdd, String datasetBasePath) {
         if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
             return;
         }
         Path source = Paths.get(sourPath).normalize();
-        Path target = Paths.get(targetPath).normalize();
+        Path baseDir = Paths.get(datasetBasePath).toAbsolutePath().normalize();
+        Path target = baseDir.resolve(targetPath).normalize();
 
+        // Ensure the target path is still under the dataset root to prevent directory traversal
+        if (!target.startsWith(baseDir)) {
+            log.warn("Target path is outside of dataset base directory. baseDir={}, target={}", baseDir, target);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
EOF
Files.createSymbolicLink(target, source);
} else {
// Overwrite an existing target file, consistent with behavior elsewhere
Files.copy(source, target);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

In general terms, the fix is to validate and constrain any user‑provided path before using it in file system operations. For addFile(String sourPath, String targetPath, boolean softAdd), we should ensure that sourPath is either: (a) under a known, trusted base directory (for example, a configured upload or staging directory), and that its normalized form does not escape that directory; or (b) at least not an absolute path and not containing .. path traversal components if the design expects only relative paths. The same normalized‑under‑base check is already used elsewhere in the project (e.g., in directory download/delete flows) and is a standard pattern.

The best fix here without changing higher‑level behavior is:

  • One option is to introduce a base directory field in DatasetFileApplicationService (e.g., injected from configuration via @Value("${datamanagement.add-files.source-base-path:}") or similar). However, since we cannot safely assume config keys and must not change behavior too much, a safer minimal change is to enforce that the source path is not absolute and does not contain .. components. This keeps the semantics “copy/link from a path the caller provides”, but blocks obvious path traversal and system‑file access by absolute paths.
  • In addFile, after building Path source = Paths.get(sourPath).normalize();, add validation:
    • Reject if source.isAbsolute().
    • Reject if the normalized path contains any .. segment (can be detected by scanning source’s name elements).
  • Optionally log and throw BusinessException with an appropriate error code when validation fails.

These changes are all local to DatasetFileApplicationService.addFile. No other files need modification. No new external libraries are required; we rely solely on java.nio.file.Path methods and existing exception types.

Concretely, in DatasetFileApplicationService.java:

  • In the addFile method (lines 866–903), after computing source, insert validation logic to ensure it is a relative path without .. segments. If validation fails, log and throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR) (or a more specific error code if desired).
  • Keep the rest of the method as‑is so functionality (copy/link behavior, error handling) remains unchanged except for rejecting unsafe source paths.
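The source-path validation described above can be sketched as a small predicate. This is an illustration under assumptions: SourcePathCheck and isSafeRelative are hypothetical names, and the method returns a boolean where the patch would log and throw BusinessException.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SourcePathCheck {

    /** Hypothetical predicate: true only for relative paths with no ".." element. */
    static boolean isSafeRelative(String raw) {
        Path p = Paths.get(raw).normalize();
        if (p.isAbsolute()) {
            return false;
        }
        // After normalize(), interior "dir/.." pairs are already collapsed,
        // so any ".." still present leads out of the starting directory.
        for (Path part : p) {
            if ("..".equals(part.toString())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isSafeRelative("uploads/a.csv"));
        System.out.println(isSafeRelative("/etc/passwd"));
        System.out.println(isSafeRelative("a/../../b.csv"));
    }
}
```

A relative path accepted by this check is still resolved against the JVM's working directory when the file is opened, so the check limits where the source can point only relative to that directory.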
Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -870,6 +870,18 @@
         Path source = Paths.get(sourPath).normalize();
         Path target = Paths.get(targetPath).normalize();
 
+        // Validate the source path to prevent path traversal and access to arbitrary absolute paths
+        if (source.isAbsolute()) {
+            log.warn("Rejected absolute source path when adding file: {}", source);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+        for (Path part : source) {
+            if ("..".equals(part.toString())) {
+                log.warn("Rejected source path with parent directory reference when adding file: {}", source);
+                throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+            }
+        }
+
         // 检查源文件是否存在且为普通文件
         if (!Files.exists(source) || !Files.isRegularFile(source)) {
             log.warn("Source file does not exist or is not a regular file: {}", sourPath);
EOF
Files.createSymbolicLink(target, source);
} else {
// Overwrite an existing target file, consistent with behavior elsewhere
Files.copy(source, target);

Check failure

Code scanning / CodeQL

Uncontrolled data used in path expression High

This path depends on a user-provided value.

Copilot Autofix

AI 4 days ago

In general, to fix uncontrolled path usage you must (1) validate any user-controlled path components and/or (2) constrain resulting paths to a known safe base directory after normalization. For this code, the vulnerable sink is the addFile method: it takes two string paths where both arguments can be influenced by the request, resolves them via Paths.get, then performs file operations.

Best fix while preserving current behavior:

  1. Treat the dataset’s directory (dataset.getPath()) as the only allowed root for target paths.
  2. Normalize and resolve the constructed target path against that root, and ensure it does not escape the dataset root (startsWith check).
  3. Optionally, perform minimal sanity checks on the client-supplied file.getFilePath() to avoid obviously dangerous sources, even if they might be semi-trusted already.

Because addFile currently only receives String sourPath and String targetPath, we need to pass the dataset root directory into it so it can enforce the containment check. We can do this with minimal change by:

  • Introducing a new private helper addFileWithinDataset(Path datasetRoot, String sourPath, String targetPath, boolean softAdd) that:
    • Validates sourPath and targetPath are non-blank (as today).
    • Computes Path source = Paths.get(sourPath).normalize().toAbsolutePath();
    • Computes Path datasetRootAbs = datasetRoot.normalize().toAbsolutePath();
    • Computes Path target = datasetRootAbs.resolve(targetPath).normalize();
    • Ensures target.startsWith(datasetRootAbs); if not, log and throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR) (or a more specific error).
    • Proceeds with the existing existence and copy/link logic, but uses the validated target and source.
  • Updating addFilesToDataset so that instead of calling addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd()); it calls addFileWithinDataset(Paths.get(dataset.getPath()), file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());.
  • Leaving the old addFile method in place and unchanged. The only call shown is from addFilesToDataset, so that call switches to the new helper, while any callers not visible here keep their current behavior.

This approach keeps all external functionality (copy vs softAdd, overwrite behavior, etc.) the same, but ensures that the target file is always inside the dataset’s directory, preventing path traversal via req.getPrefix() or any future uses of DatasetFile.filePath.
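As a standalone illustration of the containment check described above, here is a minimal sketch under stated assumptions: DatasetPathGuard and resolveWithinRoot are hypothetical names, the paths are invented, and IllegalArgumentException stands in for the project's BusinessException so the example compiles on its own. It calls toAbsolutePath() before normalize(), a small deviation from the order used in the patch.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class DatasetPathGuard {

    // Resolves an untrusted path against root and rejects anything that ends up outside it.
    public static Path resolveWithinRoot(Path root, String untrusted) {
        Path rootAbs = root.toAbsolutePath().normalize();
        // resolve() returns the argument itself when it is absolute, so the check below covers that case too.
        Path target = rootAbs.resolve(untrusted).normalize();
        if (!target.startsWith(rootAbs)) {
            throw new IllegalArgumentException("Path escapes dataset root: " + untrusted);
        }
        return target;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/data/datasets/ds1"); // hypothetical dataset root
        System.out.println("accepted: " + resolveWithinRoot(root, "images/cat.png"));

        String[] suspicious = {"../ds2/secret.txt", "/etc/passwd", "a/../../x"};
        for (String candidate : suspicious) {
            try {
                resolveWithinRoot(root, candidate);
                System.out.println("accepted (unexpected): " + candidate);
            } catch (IllegalArgumentException e) {
                System.out.println("rejected: " + candidate);
            }
        }
    }
}
```

Note that normalize() is purely lexical and does not follow symbolic links. If the dataset directory can contain links pointing elsewhere, a stricter variant would compare real paths obtained with Path.toRealPath() once the parent directory exists.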


Suggested changeset 1
backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java

Autofix patch
Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
--- a/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
+++ b/backend/services/data-management-service/src/main/java/com/datamate/datamanagement/application/DatasetFileApplicationService.java
@@ -842,13 +842,15 @@
         dataset.setFiles(existDatasetFiles);
         try {
             ObjectMapper objectMapper = new ObjectMapper();
+            Path datasetRoot = Paths.get(dataset.getPath());
 
             for (AddFilesRequest.FileRequest file : req.getFiles()) {
                 DatasetFile datasetFile = getDatasetFileForAdd(req, file, dataset, objectMapper);
                 setDatasetFileId(datasetFile, dataset);
                 dataset.addFile(datasetFile);
                 addedFiles.add(datasetFile);
-                addFile(file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
+                // Confine the target path to the dataset root to prevent path traversal
+                addFileWithinDataset(datasetRoot, file.getFilePath(), datasetFile.getFilePath(), req.isSoftAdd());
             }
         } catch (BusinessException e) {
             throw e;
@@ -863,6 +859,55 @@
         return addedFiles;
     }
 
+    /**
+     * Adds a file under the given dataset root, ensuring the target path cannot escape the dataset directory.
+     */
+    private void addFileWithinDataset(Path datasetRoot, String sourPath, String targetPath, boolean softAdd) {
+        if (datasetRoot == null || StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
+            return;
+        }
+        Path source = Paths.get(sourPath).normalize().toAbsolutePath();
+        Path root = datasetRoot.normalize().toAbsolutePath();
+        Path target = root.resolve(targetPath).normalize();
+
+        // Ensure the target path stays under the dataset root to prevent path traversal or unauthorized access
+        if (!target.startsWith(root)) {
+            log.warn("Target path escapes dataset root. root={}, target={}", root, target);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
+        // Check that the source file exists and is a regular file
+        if (!Files.exists(source) || !Files.isRegularFile(source)) {
+            log.warn("Source file does not exist or is not a regular file: {}", sourPath);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+
+        try {
+            Path parent = target.getParent();
+            // Create the target directory if needed
+            if (parent != null) {
+                Files.createDirectories(parent);
+            }
+            Files.deleteIfExists(target);
+            if (softAdd) {
+                // Try a hard link first, then fall back to a symbolic link; throw if both fail
+                try {
+                    Files.createLink(target, source);
+                    return;
+                } catch (Throwable hardEx) {
+                    log.warn("create hard link failed from {} to {}: {}", source, target, hardEx.getMessage());
+                }
+                Files.createSymbolicLink(target, source);
+            } else {
+                // Overwrite an existing target file, consistent with behavior elsewhere
+                Files.copy(source, target);
+            }
+        } catch (IOException e) {
+            log.error("Failed to add file from {} to {}", source, target, e);
+            throw BusinessException.of(SystemErrorCode.FILE_SYSTEM_ERROR);
+        }
+    }
+
     private void addFile(String sourPath, String targetPath, boolean softAdd) {
         if (StringUtils.isBlank(sourPath) || StringUtils.isBlank(targetPath)) {
             return;
EOF
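The soft-add branch in the patch tries a hard link and falls back to a symbolic link. The sketch below isolates that fallback so it can be tried in a temporary directory. It is an illustration rather than the project's code: LinkFallbackDemo, linkOrSymlink, and runDemo are hypothetical names, and the catch clause is deliberately narrower than the Throwable caught in the patch, listing only the exceptions that Files.createLink is documented to throw.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class LinkFallbackDemo {

    // Tries a hard link first; if that fails, tries a symbolic link and lets its failure propagate.
    public static String linkOrSymlink(Path target, Path source) throws IOException {
        try {
            Files.createLink(target, source);
            return "hard";
        } catch (IOException | UnsupportedOperationException | SecurityException e) {
            Files.createSymbolicLink(target, source);
            return "symbolic";
        }
    }

    // Creates a source file in a temp directory, links to it, and returns the content read via the link.
    public static String runDemo() {
        try {
            Path dir = Files.createTempDirectory("link-demo");
            Path source = Files.writeString(dir.resolve("source.txt"), "hello");
            Path target = dir.resolve("linked.txt");
            String kind = linkOrSymlink(target, source);
            System.out.println(kind + " link created at " + target);
            return Files.readString(target);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("content via link: " + runDemo());
    }
}
```

Which kind of link is created depends on the file system and on process permissions, so the sketch reports the outcome instead of assuming one.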
@hefanli merged commit 0092e87 into main Feb 24, 2026
7 of 8 checks passed
@hefanli deleted the fix_paging branch February 24, 2026 06:52